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Abstract 

This technical note considers the problems of blind sparse learning and inference of electrogram 
(EGM) signals under atrial fibrillation (AF) conditions. First of all we introduce a mathematical model 
for the observed signals that takes into account the multiple foci typically appearing inside the heart 
during AF. Then we propose a reconstruction model based on a fixed dictionary and discuss several 
alternatives for choosing the dictionary. In order to obtain a sparse solution that takes into account the 
biological restrictions of the problem, a first alternative is using LASSO regularization followed by a 
post-processing stage that removes low amplitude coefficients violating the refractory period characteristic 
of cardiac cells. As an alternative we propose a novel regularization term, called cross products LASSO 
(CP-LASSO), that is able to incorporate the biological constraints directly into the optimization problem. 
Unfortunately, the resulting problem is non-convex, but we show how it can be solved efficiently in an 
approximated way making use of successive convex approximations (SCA). Finally, spectral analysis is 
performed on the clean activation sequence obtained from the sparse learning stage in order to estimate 
the number of latent foci and their frequencies. Simulations on synthetic and real data are provided to 
validate the proposed approach. 
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I. Introduction 

The clinical term atrial fibrillation (AF) refers to a family of common heart disorders characterized by 
fast and uncoordinated activations in the atrium. The mechanisms causing the initiation and maintenance 
of AF comprise a set of heterogenous interactions at different levels (cells, tissues and the whole heart) 
changing along time and resulting into different states of AF [ Nattel et al.[ 2000[ [Everett and Olgin 



2004 Nattel et al. 2005 1 . Several theories about the physiological causes underlying AF initiation and 



maintenance have been formulated over the last 50 years | Nattel[ 2002| . One of the most prominent 
hypothesis considers multiple uncoordinated activation foci placed at different locations inside the atrium. 
These fast and asynchronous activations cause a disordered global electrical activity that contributes to 
AF maintenance. |Krummen and Narayan[ 2009| . In contrast, during normal heart operation conditions 
(sinus rhythm) we observe a single activation focus, placed at the sinus node, acting as a pacemaker for 
the whole heart and leading to a regular global electrical activity. 

In order to understand the pathophysiology of AF, dominant frequency analysis (DFA) has been 
traditionally used to analyze the data collected from electrocardiograms (ECGs) or electrograms (EGMs). 
DFA is useful for identifying the areas corresponding to the highest activation frequencies that may be the 



drivers maintaining AF, and therefore the targets of ablation therapy for AF termination | Sanders et al. 



2005 [ . However, DFA provides very limited information about the signal's structure, since it is based 
on the implicit assumption that the underlying signal consists of a single quasi-periodic component plus 



an irregular component [Barquero- Perez et al.[ 2010 1. Hence, the only spectral parameter required is the 



dominant frequency (DF), which tries to characterize the periodicity of the signal, but is very sensitive 



to distortions and often provides misleading information | |Ng et al^ |2007[ . 

More recently, organization analysis techniques have been introduced, and additional parameters, such 
as the regularity index (RI) and the organization index (01), have been used to describe the signals 



[Fischer et al.[ 2007 Barquero-Perez et al. 2010 1. Many other linear and non-linear measures have been 



proposed for the characterization of AF [ [Mainardi et al.[ [200 1[ [Nguyen et al.[ [2010| | : the cross-correlation 
index, the non-linear association measure, the fractionation index, etc. However, all of them are still based 
on the same implicit assumption: the observed signals can be modelled by a single regular component 
plus distortion and noise. 
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In this technical report we summarize the formulation introduced in |Monz6n et al. 2012 1 and introduce 
a novel formulation based on a new sparse regularization term that incorporates the biological restrictions 



imposed by the refractory period of cardiac cells |Luengo et al. 2013 1. Overall, in these two papers we 
make two main contributions. First of all, we introduce a more realistic mathematical model that takes 
into account the multiple activation foci, and use it to perform spectral analysis, detecting the number 
of foci and their frequencies. And secondly, recognizing the sparse nature of the recorded signals, we 
apply a sparsity-aware learning technique, based on LASSO, to obtain an activation sequence on which 



the spectral analysis is performed. In |Monz6n et al. 2012 1 this is followed by an additional stage that 
gets rid of spikes that violate the biological restrictions, whereas in [ Luengo et al.[ [20T3 | we include 
this term inside the regularization, obtaining a novel regularization term, called cross-products LASSO 
(CP-LASSO), since it is based on cross-products of coefficients associated to different time instants in 
the reconstruction model, that we add to the Li norm regularization term introduced by LASSO. 

In the sequel we use bipolar intracardiac electrograms (EGMs), obtained placing a set of electrodes 



in direct contact with the heart muscle during heart surgery |Ng and Goldberger} 2007 Sanders et al. 



2005[ . The resulting signal processing algorithm applied to the signals consists of four steps: 



1) Pre-processing to eliminate potential artifacts, especially outside of the frequency range of interest. 

2) Inferring the spike trains associated to the activation times using a sparsity-aware learning technique 
based on LASSO [Tibshirani 1996 1, plus a later stage to ensure that biological restrictions are met 
[ Monzon et al.[ 2012 1, or on CP-LASSO nLuengo et al] [20T3l without any additional stage. 

3) Sparse spectral analysis of that activation sequence, using an iterative deflation approach to detect 
the number of foci and their frequencies. 

4) Post-processing in order to eliminate harmonics and subharmonics. 

The report is structured as follows. First of all, in Section |ll] we briefly review the prevalent approach 



for the analysis of EGMs: dominant frequency analysis. Then, in Section III we describe the problem 
formulation used throughout the paper, showing the novel mathematical model (based on a set of 
unobserved latent signals) proposed for describing the recorded EGMs, the sparsity-aware formulation 
introduced for solving it, and several dictionaries considered for modelling the unknown latent signals. 
Section |V] describes the approach proposed in | Monzon et al. 2012 1 for inferring the sparse activations: 
a sparsity-aware learning technique based on LASSO, plus a second stage to incorporate the biological 



constraints. The alternative formulation proposed in | Luengo et al. 2013 1, based on adding a new 
regularization term (CP-LASSO) to the sparse learning problem that takes into account the biological 
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constraints, is described in Section VI Unfortunately, this new regularization term leads to a non-convex 
optimization problem, so we have to look for methods that are available to produce approximate solutions 
in a reasonable computational time. The method chosen, successive convex approximations (SCA), is also 
described in this section. Then, Section |VII shows how the sparse spike train inferred using either of 
these two approaches can be used to perform sparse spectral analysis (SSA), thus inferring the number 
of latent foci as well as their activation frequencies. Finally, the conclusions and future lines close the 



paper in Section VIII 



II. Background 

A. Dominant Frequency Analysis 

Dominant frequency analysis (DFA) is the prevalent approach for the analysis of EGMs. DFA 
assumes implicitly that the observed signals are composed of a single regular component (i.e., a quasi- 
periodic signal) plus an irregular component including the remaining noise and distortion. Hence, from a 
mathematical point of view, the g-th output (EGM), 1 < q < Q with Q denoting the number of outputs. 



can be modelled as [Fischer et al.| 2007 1 



yq{i) = ^ (t)q{t-kTq-fq) + Wq{t), 



(1) 



where (j)q{t) indicates the average shape of the regular component of the signal, with Tq denoting its 
period and fq the delay for A; = 0, and Wq{t) is used to represent the irregular components. The goal of 
DFA is characterizing that quasi-periodic signal through its average period, Tq, or equivalently its average 
frequency, fq = 1/Tq, which is the so called dominant frequency (DF). Occasionally other parameters, 
such as the organization or the regularity indexes, are also obtained to determine whether the estimated 



DF is reliable or not |Barquero-Perez et al.[ |2010[ [Fischer et al.[ |2007[ . 

The DF is usually obtained separately for each channel using standard spectral analysis techniques. 



The typical signal processing approach includes the five steps for each EGM [Fischer et al.[ |2007[ shown 
in Algorithm [T] Several segments can be averaged in order to improve the estimation of the dominant 
frequency. However, the ability of the DF to reflect the average atrial activation rate depends on the 
accuracy of ([T]) in representing the true observed signal. Unfortunately, several characteristics of atrial 
activation, such as the complexity of the electrogram morphology, can alter the power spectrum. In these 
cases, the DF, fq, is often more related to the complexity of the signal than to the atrial activation rate, 
thus providing misleading information | Ng et al^ [2007 1. 
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Algorithm 1 Dominant frequency analysis (DFA) for the q-th signal. 

1. Band-Pass filtering from 30 Hz to 400 Hz. 

2. Rectification of the resulting signal, recovering near direct current (DC) spectral components. 

3. Low-Pass filtering with a cut-off frequency of 15 Hz. 

4. Computation of the spectrum using a locahzed Fast Fourier Transform (FFT) with a Hanning window 
of A = 4 s duration, resulting in a resolution /a = 1/A = 0.25 Hz in the frequency domain. 

5. Search for the peak with the maximum amplitude in the frequency domain. The frequency associated 
to this peak is the dominant frequency (DF) of the q-th EGM, fg. 



III. Problem Formulation 

In this section we show the novel problem formulation proposed as an alternative to the DFA 
formulation shown in the previous section. First of all, we introduce a more realistic mathematical model 
based on the assumption that the observed signals are the result of several unobserved latent functions 
(the unknown activation foci that we want to estimate) propagating through the heart. Then, since the 
real shapes of these latent signals are not precisely known, we introduce a sparsity-aware formulation to 
solve the problem based on an overcomplete dictionary. 



A. Signal Model 

In this technical report we focus on the analysis of electrograms, although the proposed approach can 
also be applied to other types of signals, as shown in | Luengo et al. 2013[ . Our basic assumption is that 
the recorded EGMs are composed of the sum of several periodic or quasi-periodic signals plus distortion 
and noise. Each of these observed periodic signals are the result of a set of sparse activation foci (spike 
trains) that propagate through the atrium and reach the sensors. Hence, these unobserved activations play 
the role of latent signals, providing a principled way of describing the correlation between the outputs. 
Our primary goal here is detecting the number of activation foci, as well as their frequencies. 

From a mathematical point of view, let us consider a model with Q correlated outputs, yq{t), obtained 
from a set of bipolar electrodes. These observations are generated by R activation foci (latent signals) 
propagating inside the atrium, plus noise and interference. Hence, we model the output of the q-th channel 
(1 < g < Q) as 

R 

Vqi^) = ^Prit) * hr,q{t) + Wq{t), (2) 



r=l 
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where Pr{t) (1 < r < R) denotes the r-th foci, Wq{t) models all the elements in the q-th output that cannot 
be explained by the model (i.e., noise, interferences and distortion), hr^g{t) is the impulse response of 
the channel between the r-th foci and the q-th output EGM and * denotes the standard linear convolution 
operator{^ Since we are not interested in recovering the precise shape of the activations, but only in their 
number and frequencies, we model them as periodic spike trains]^ 

oo 

Prit)= Ar,q[k]6{t-kTr-Tr), (3) 

fc=— oo 

with 5{t) denoting Dirac's delta, Tr = l/fr the average period of the r-th spike train (with fr denoting 
its associated average frequency) and r,. its shift w.r.t. the origin (0 < < T^)]^ Finally, substituting Q 
into ([2]), the g-th output becomes 

R oo 

y'}it) = Y. Ar,q['^]hrAt-kTr-Tr)+Wg{t). (4) 

r=l fc=~oo 

The discrete-time version of this model, obtained assuming a uniform sampling frequency, fs = l/Ts = 
977 Hz0 would be 

R oo 

ygN = VqinTs) = X] ^r-,g[/c]/ir,g[?^, k] + Wg[n\, (5) 

r=l fc=— oo 

'Note that hr.q{t) includes the response of the sensor and can be slowly time- varying. However, since the sparse learning 
and the subsequent spectral analysis are performed using short time windows, we can consider the channel to be time-invariant 
in practice. 

^Note that this is not a limitation, since we can always include the shape of the activations inside the channel's impulse 
response, hr.qit). We also remark that the amplitude term, Ar.q[k], was not present in the MLSP formulation |Monz6n et al. 
|2012| . However, we include it here since it allows us to take into account effects such as the amplitude modulation often 
observed in EGMs or the fact that some activations may not actually be observed (due to blocking phenomena inside the heart, 
the refractory period of cardiac cells or some other factor). 

'Let us remark that only a reduced frequency range is meaningful from a physiological point of view. On the one hand, for 
sinus rhythm the heart rate can vary between 30 beats per minute (bpm) and 120 bpm with a typical range of 50-100 bpm, i.e. 
the range of valid frequencies is 0.5 < /,. < 2 Hz or equivalently 0.5 < T,. < 2 s, with typical ranges 5/6 < /r < 5/3 Hz or 
equivalently 0.6 < Tr < 1.2 s. On the other hand, when we analyze EGMs measured during atrial fibrillation (AF), atrial cells 
can fire at rates of 120-600 bpm (with a typical range of 400-600 bpm) |Nattel[ |2002| , leading to a useful frequency range 
2 < /,. < 10 Hz or equivalently 0.1 < < 0.5 s with typical ranges 20/3 < /r < 10 Hz or equivalenfly 0.1 < < 0.15 s. 
Hence, those will be the ranges considered in the sequel: 0.5 < /r < 2 Hz for sinus rythm and 2 < /,■ < 10 Hz for AF. 

''since the sampling frequency {fs = 977 Hz) is very large compared to the frequencies of interest (fr < 10 Hz), for 
simulation purposes we often apply a decimation to the EGM signals, thus obtaining a final sampling frequency fs = fa/L 
with L £ {1,2,3,4} (i.e., 244.25 < /s(Hz) < 977). This allows us to reduce the computational cost of the signal processing 
algorithms applied without compromising their performance. 
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where hr^q[n,k] = hr^q{nTs — kTr — r,.) is the discrete-time equivalent channel and Wq[n] = Wq{nTs) 
are the noise plus distortion and interference samples at the sampling instants]^ In the sequel we make 
use of this discrete-time model, focusing on inferring the global spike train (i.e., the spike train resulting 
from the sum of the R foci), and using it to estimate R and = l/T^ for r = 1, . . . , R. 

B. Reconstruction Model Based on an Overcomplete Dictionary 

Let us denote the {N + 1) x 1 vector with the samples from the g-th EGM by = 
[yg[0]) • • • 1 yq[^]V ' with yq[n] = yq{nTs) obtained sampUng yq{t) uniformly with a sampUng 

frequency fs = '^/Ts Hz. Now, let us define the iV x 1 vector containing the discrete-time differentiation 
of the g-th output, Zg = [zq[l\, Zq[2\, . . . , Zg[Af]]^ with Zq[n] = yq[n] — yq[n — 1] for 1 < n < A^. 
Since we are not interested in the precise shape of the activations, and the number of latent foci is still 
unknown, we approximate Zq[n] by a mixture of shifted smooth generic curves]^ 

M 

^qi'^] = X] /^'".^W * Gm[n] + OqEqln] 

m=l 
M N 

= ^ ^ Pm,q [k] Gm [n - k] + GqEq [u] , (7) 
m=l k=l 

where eq[n\ is additive white Gaussian noise (AWGN) with zero-mean and unit variance (i.e., eq[n] ~ 
M{0, 1)), aq denotes the actual noise variance, assumed to be known or estimated from the data, and 
I3m,q[k] is the coefficient of the q-th output associated to the A;-th shift of the m-th activation shape, 
Gm{t), for 1 < m < M, 1 < q < Q and 1 < k < N. Note that this model is similar to the discrete-time 
equivalent model assumed for the data, given by Q, and results in the following equivalent continuous- 
time model: 

M 

Zq{t) = (^rn,qit) * Gm{t) + CTqEqit) . (8) 

m=l 

^Note that, due to the discretization, the discrete-time equivalent channel may be time-varying even when hr,q{t) is time- 
invariant. This is due to a fractional sampling effect, caused by the fact that 

and the samples associated to different time-shifts of the channel will not coincide whenever Tr/Ts is not an integer number. 
Hence, the discrete-time equivalent channel, hr,q[n, k], can indeed be time-varying even when the underlying continuous-time 
channel, hr^q{nTs — kTr — Tr), is time-invariant. However, if we assume that T^ <C max,. T,, (i.e., fs 3> rnax^ /r), as it occurs 
in this case, where we have fs > 244.25 Hz and fr < 10 Hz, this effect will be small and we may ignore it. 
*A detailed analysis of the limits for the convolution in (|7} can be seen in the Appendix. 
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Indeed, the models assumed by the sparsity-aware formulation, given by ^ and ([8]l, are very similar to 
the assumed underlying models, given by Q and Q, although there are two important differences: 

1) Focusing on the discrete-time models, we notice that they describe the first-order time-difference of 
the sampled EGM signals instead of the signals themselves. Regarding the equivalent continuous- 
time models, this is akin to working with the first derivative of the signals (which is related to 
the time-difference in the limit) instead of the signals. This is a common approach to remove the 
baseline of the signals. 

2) Since the number of activations and their shapes (i.e., the impulse responses associated to the Q 
channels) are unknown, we use a set of M ^ R activations constructed using generic smooth 
curves, Gm{t)- Note that the same activation shapes are used for all the channels, and we let them 
select which activations are actually relevant in each case through a sparse learning process based 
on LASSO or CP-LASSO. This allows us to effectively remove the subindex q from the original 
activations, hj.^q{t), moving it to the set of coefficients, Pm,q{'t)- 

IV. OVERCOMPLETE DICTIONARIES FOR SPARSE LEARNING 

In this section we describe several possible choices for the elements of the overcomplete dictionary 
used in the reconstruction model. 



A. Gaussian Dictionary 

In | |Monz6n et al.[ |2012[ , the activation shapes were modelled as samples from truncated and time- 



shifted Gaussian functions]^ 

(/>(^)(t) = G^{t) = ^ exp --^ for -Tm<t<Tm, (9) 

with Tra = NmTs a user-defined threshold (set up in practice so that Gm(±7"m) is close to zero), 
and cr^ a finite set of M > i? user-defined variances with af < o"2 < ... < Let us define 

Tmax = max T„, = Tm and iVmax = max Nm = Nm, with Nm = L^m/^sJ and [x\ denoting the 

l<m<A/ l<m<M 

integer part of the real number x. Now, from the continuous-time activation shape, Gm{t) with support 
—Tm < i < 7Af0we can construct the discrete-time activation elements through uniform sampling with 



^Note that we consider an energy-normalized Gaussian instead of the standard unnormalized Gaussian used in 
20I2| . The derivation of the normalized Gaussian can be seen in the Appendix. 



Monzon et al. 



We have to consider the largest support for all the activation shapes in the dictionary, even though we know that Gm {t) = 

for \t\ > Tm. 
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a period Tg = 1/ fs and time-shifting by Nm samples, i.e., 

(/.(^)[n] = G^[n] = G„((n - Nm)Ts) = G„((n - [Tm/Ts\)Ts). 



(10) 



Hence, all the discrete-time activation elements suffer a delay of Nm samples (i.e., NmTs seconds) that 
must be taken into account when interpreting the results obtained. 

Now we can rewrite the sparse model in (|7]) more compactly in matrix form by defining a set of x M 
matrices, for 1 <k < N, such that their (n, m)-th element is ^^kin, m) = (t)"m[n — k] = Gm[n — k] 
for 1 < n < N and 1 < m < M, i.e., 

Gi[l-k] G2[l-k] ■■■ GM[l-k] 
Gi[2-k] G2[2-k] ■■■ GM[2-k] 



Gi[n-k] G2[n-k] 



Guin - k] 



(11) 



Gi[N-k] G2[N-k] ■■■ GM[N-k] 
Concatenating all these matrices we obtain an overcomplete global dictionary (note that we have MN 
dictionary elements and only < MN samples) that can be collected in the following x MN matrix]^ 



Ll ^ Z) • • • 

Gi[0] 

Gi[n-1] 

Gi[N-l] 

Gi[0] 
Gi[l] 

Gi[n-1] 




Gm[0] 
Gm[1] 



Gi[-l] 
Gi[0] 



GM[n-l] Gi[n-2] 

GA/[iV-l] Gi[N-2] 

Gm[0] 
Gm[1] Gi[0] ■ 

GM[n-l] Gi[n-2] • 











• Gm[-1] 
■ Gm[0] 

• GM[n-2] 

• Gm[N-2] 



Gm[0] ■■ 
GM[n-2] ■■ 




Gi[l-N] ••• Gm[1-N] 
Gi[2-iV] ••• Gm[2-N] 

Gi[n-N] ••• GM[n-N] 



Gi[0] 



Gm[0] 








Gi[0] 



Gm[0] 



(12) 



'Note that, due to the use of truncated and time-shifted Gaussians, Gm[n] — whenever n < or n > 2Nm.- Hence, many 
elements in (1 < < A^), and thus also in will actually be zero, as sketched in l |12[ >. 
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where we have assumed that N » 2Nm in the last expression, as is usually the case in practice. Now, 
using ( [72] ), Q can be expressed in a completely equivalent way as 



(13) 



where Sq = [£q[l], £^[2], . . . , eg[A^]]^ is an x 1 column vector with the noise samples associated to 
each sample of Zq[n\, and (3g is an MN x 1 column vector composed of N subvectors of size M: 



Pg[k] = WiAk], f^M^kW, l<k<N. 



(14) 
(15) 



Finally, note that this dictionary is not fitted to detect activation times close to the initial boundary 
of the signal (i.e., n = 1). However, this issue can be easily circumvented by adding 2Nm zeros to 
the signal to be processed before 2:[l]{^This would result in an extended support for the sequence, 
— {2Nm — 1) <n < N, an extended discrete-time differentiation vector. 



Xq[n] = [0^_^, Zq[l], . Zq[N] ] 
2Nm zeros N samples 



(16) 



an extended coefficients vector. 



with /3(r[/!;] still given by ( [T5| ) for —{2Nm — ^) < k < N, and an extended dictionary matrix, 



with 



* = [*_(2JV„-1), *-l, *0, *1, *2, *7V], 



(17) 



(18) 



Gi[-i2NM - 1) - k] G2[-i2NM - I) - k] 



Gi[0-k] 
Gi[l-k] 



Gi[N-k] 
again for -{2Nm -1) <k< N. 



G2[0-k] 
G2[l-k] 

G2[N-k] 



GM[-mM-l)-k] 



GM[0-k] 
GM[l-k] 



GM[N-k] 



(19) 



An alternative way of avoiding this problem is by guaranteeing that no activation is present in the observations inside the 
first Nm samples. 
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B. Ideal Dictionary 

The optimum dictionary would in fact be composed of a set of Q dictionaries tailored to the 
characteristics of each of the Q outputs. More specifically, the dictionary for the q-th output would 
be given by the following N x RN matrix. 



*9 = [*l,g, ^2,q, 



(20) 



where ^k,q is the N x R matrix composed of the R impulse responses between the r-th latent signal 
(1 < r < R) and the q-th observation, i.e., 

hl,q[^-k] h2,q[l-k] ■■■ hR,g[l-k] 
hl,g[2-k] h2,q[2-k] ■■■ hR^q[2-k] 



k,q 



hi^q[n — k] h2,q[n — k] 



hi,g[N-k] h2,g[N-k] ■ 
Using this dictionary, ^ can be expressed now as 

Z<7 = ^qPg + <yq£q, 



hR,q[n - k] 



hR,qW-k] 



(21) 



(22) 



where /3„ still has the structure described in (14), with f3g[k] = [l3i^g[k], . . . , /3_r,<j[A;]] . This dictionary 



would lead to the sparsest possible solution, consisting in approximately NQ^^^^Ts/Tj- non-zero 
elements that will coincide with the amplitudes of the activations, ^r,g[/i;]- 



C. Alternative Dictionaries 

Unfortunately, the ideal dictionary discussed in the previous section requires either knowledge of the 
hr,g{t) or a reliable estimation, something which is not easy for the application considered. However, 
it provides us with a criterion for comparing different dictionaries: the best dictionary will be the one 
that attains the sparsest representation (thus allowing us to get closer to the lower bound provided by the 
unavailable ideal dictionary), while obtaining a good reconstruction error (e.g., ensuring that the L2 norm 
of the reconstruction error is below a given threshold). We are currently considering better dictionaries 
using wavelets or wavelet packets, and even activations extracted from real data, since this should lead to 



sparser solutions with a good reconstruction error. As an example, in |Luengo et al. 2013 1 we consider 
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the Mexican hat wavelet, also known as Ricker waveletp^ 

= '^"W = - '^="> (-4) ■ 

which is the negative normalized second derivative of a Gaussian function, and is used due to its similarity 
to activations observed in real data Pi 



V. Indirect Sparse Solution: LASSO plus Post-Processing 



In IMonzon et al. 2012 1 we obtained a sparse vector of coefficients for ( [T3] ), P^, following a two step 



procedure, which is a variation of the algorithm introduced in [Trigano et al.[ [2011 1 : an initial sparse 



solution obtained by applying a LASSO regularization is followed by a greedy procedure for selecting 
only the largest coefficients that respect the biological constraints. 

In order to obtain a sparse regressor, from which the information on the arrival times can be retrieved. 



we estimate jd^ initially by means of LASSO ITibshirani 1996 1. Namely, l3^\Xq) is given by 



^q\Xq) = argmin|-^||zg-A/3 + 11/3 llil, (24) 

where ||/3g||i denotes the Li norm of and indicates the trade-off between sparsity and estimation 
precision: the higher the value of Xg the more emphasis will be placed on obtaining a sparse solution, 
although at the expense of an increased quadratic error in the approximation^^ 

However, in order to obtain an even sparser representation that takes into account the physiological 
restrictions imposed on the signals, we introduce an additional step after the computation of f3^^{Xq). 
The samples associated to the arrival times of the spikes are estimated recursively as follows: 

nfc,g = argmax|||/3j"[n]||il(r/g < ||^g'[n]||i < ||^^' [n^.i,,] 

n=l,...,N ^ J 

s.t. |nfc,(? - "-£,9! > ^min, for £ = 1, . . . , /c - 1, (25) 
"The derivation of the mexican hat wavelet can be seen in the Appendix. 

'"We remark that both the Gaussian and the mexican hat belong to the class of Hermitian wavelets, so called because the 
amplitude of the i?-th Hermitian wavelet depends on the i?-th order Hermite polynomial. Indeed, the superscript in (jim' {t) indicates 
the order of the wavelet (zero for the Gaussian and two for the mexican hat). Further details can be seen in the Appendix. 
i3t„ \a — „A — * „i 2012 , \q — 10~* was used for the sinus rhythm simulations and A, — 10^" for the AF simulations. 



'In 



Monzon et al. 



Note that having a smaller value of A, for AF implies that a less sparse solution will be obtained for AF in comparison to sinus 
rhythm. Both values of \q were obtained through an exhaustive search using real data. 
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where I(-) is an indicator function, i.e., a function that takes a value equal to one if the logical condition 
is fulfilled and zero otherwise, 



Kv,<0^,'[n]\\i<0^,'[n,-i,,]\\i] 



1, Vq<0q'M\l<0g"[nk-l,q]\\i; 

(26) 

0, otherwise, 



and riq and N^i^ are user-defined thresholds. The first one, r]g, is used to discard the [n] with a 
small Li norm, which contribute to improve the signal reconstruction but provide little information on 
the localization of the spikes. We have found out empirically that choosing rjg = Sa^ provides good 
results|^ The second one, N^i^, accounts for the fact that consecutive pulses cannot overlap. Thus, in 
practice A^^min is chosen in such a way that Nynin/ fs ~ 100 ms (i.e., A'^tnin ~ /s/10), which is a standard 
value for the refractory period{^ The final procedure used in practice to implement p5| ) for the q-th 
channel is an iterative greedy approach that follows the steps shown in Algorithm [2] 

Algorithm 2 Iterative greedy approach for selection of the final spikes for the q-th signal. 

1. Initialization: set A; = 1 and /3max = oo. 

2. If /3max > llq: 

2.1. Select the index corresponding to the largest coefficient: 

nk,q = argmax ||^f"[n]||i. (27) 

n=l,...,N 

2.2. If 0q'[hk,q]\\i < Vg, then END. 

2.3 Otherwise, check whether \nk,q — n£^q\ > A^min for £ = 1, . . . , k — l.lf this condition is fulfilled, 

store hk,q and ^q'[hk,q], set /3max = ' [nfcj k = k + 1, (3q'[hk,q] = 

2.4 Return to step 2.1. 



Following this procedure we obtain a set of P arrival times and their associated amplitudes |^ that we 



Monzon et al. 



2012 



aq — 2-10^'^ was used for the simulations performed under induced sinus rythm, and aq — 5- 10~* 
for the atrial fibrillation (AF) simulations. Note that, as aq represents the unexplained variance in the reconstruction model (i.e., 
the energy of the reconstruction error), using a lower value for AF implies allowing less reconstruction error, which in turn 
results in a less sparse solution. In both cases, these values were obtained through an exhaustive search using real data. 

'^Since fs = 977 Hz, taking the integer part of /s/10 we obtain A^min = 97 when no decimation is applied (i.e., L = 1), 
iV„,i„ = 48 for L = 2, TV^in = 32 for L = 3 and N^in = 24 for L = 4. 

"'Note that this procedure can also be used in practice to discard noisy channels that contain no valid information. Since 
Pmin ~ max'^°T ' ^® automatically discard those channels with P < Pmin as invalid, as the activations in those channels 
will correspond to occasional large noise samples. 
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may use to construct an activation sequence (also called spike train) composed of Kronecker deltas at 
the locations of the activations 

p 

TTgN ='^S[n- flk^q]. (28) 



k=l 



This sequence was used in | Monz6n et aL[ 2012 1 to perform a spectral analysis of the clean signal given 
by ( |28| ), since it allows us to get rid of the effect of the unknown channels, hr^q{t), and the particular 
dictionary used, given by Gm (t) ■ Alternatively, we may construct this spike train taking into account the 

amplitudes associated to each coefficient, i.e., 

p 

^g[n]=^0q'[hk,g]\\l^[n-hk,q]. (29) 
k=l 

Whether this will provide useful information for the spectral analysis or not remains an open question. 

VI. Direct Sparse Solution: Cross-Products LASSO 

A. One-Step Sparsity-Aware Formulation 

Instead of following a standard sparse regression initially using LASSO, as given by (|24]), and then 
having to perform a further post-processing stage to take into account the biological restrictions of the 
problem using ( [25] ), we would like to include the problem's constraints into the sparse formulation. This 
would provide us with a more elegant formulation, potentially allowing us to obtain a better solution to 
the problem and in a more efficient way also. Note that the cost function used by LASSO is 

1 2 

^LASSO = ^ ||zg - + A, (30) 

i.e., it is composed of the least squares (LS) error between the model, A/3g, and the data, Zg, plus a 
regularization term, -Rlasso = Ag ||/9g||i, that enforces a sparsity-aware solution. On the one hand, the 



first constraint imposed by the post-processing stage in |Monz6n et al. 2012 1, having |/3q[n]| > r]q, can be 
accommodated by selecting a value of Xq large enough, so it does not require any modification in the cost 
function. On the other hand, the second constraint imposed is related to the refractory period associated 
to cardiac cells, and requires two coefficients, /3^,g[n] and Pm,q[n + A;] for 1 < m < M, to be zero 
when the distance between the centers of their associated activation shapes is less than A'^min = L/s/lOj- 
In order to incorporate this restriction to the cost function, we need to add a new regularization term to 

"Note that we have not estimated R yet. Hence, we cannot separate the contribution of each foci to ([28) as we did in the 
original model, given by (|4j. 
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(30 1 that takes into account this distance restriction. This can be done using the Lq "norm" and results 
in the following modified cost function 

Jexact = ^||zg-A/3j2 + Ag||/3,||i + p,J] J] ^ ||/3™, Jn]/3Jn + /c] ||o, (31) 

n=l m=l k=—N„iiii 

where pq is an additional regularization parameter, || • ||o denotes the Lq "norm" of a vector (i.e., the 
number of non-zero elements), Pm^qin] is the coefficient associated to the m-th activation shape of the 
q-th channel centered around the n-th sample, (3q[n + k] = [I3i^q[n + k], . . . , /3m,ij["' + ^]]^ is the M xl 
vector containing the coefficients associated to all the activation shapes of the q-th channel centered 
around the {n + A;)-th sample, and A^min = [fs/{WL)\, with 1 < L < 4 indicating the decimation 
rate, is the minimum number of zero-valued coefficients required between two consecutive activations 
due to biological reasons. Note that the newly introduced regularization term, + ^]||o> is 

equal to the number of activations that violate the biological constraint, since the vector resulting from the 
product, l3m,q[n]Pq[n + k] = [I3m,q[n\/3i^q[n + k], . .., (3m,q['^](3M,q[n + k]]^ , will contain a non-zero term 
whenever a shape is active simultaneously at the n-th and (n + A;)-th sample for —N^nn < k < N^nm with 
A; / 0. Hence, the last regularization term penalizes violations of the biological constraints, and indeed, 
by letting — oo (or by taking a very large value in practice, i.e., pq » Xq and pq S> 1/{2N)) and 
choosing an appropriate value of Xq, this cost function solves exactly the same problem as the original 
cost function plus the post-processing stage. 

Unfortunately, the Lq "norm" is generally intractable, and the general approach taken is substituting 
it by the more tractable Li norm, which provides an equivalent solution under certain conditions (often 
difficult to check in practice). Performing this standard relaxation, the modified cost function given by 
( |3T] ) turns into the following cost functionf^ 



J.ppro. = ^\\zq-Af3^\\l + Xq\\f3q\\l+PqYYl Yl 1 1 /3m,<7 N/3, + ^] 1 1 1 • (32) 

n=l m=l fc=— A'',„i„ 

'^The Lq "norm" of a vector, ||x||o, is not really a norm, since it does not satify the triangle inequality. Hence, some authors 
refer to it as a counting function (see e.g. |Tropp and Wright] |2010| ). However, with a slight abuse of terminology, here we 
will refer to it as a "norm". 

"Note that, when moving from l |31| l to \32\ , we cannot integrate the two regularization terms into a single one, as the newly 
introduced term by itself does not lead to a sparse solution. 
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The additional regularization term in (32i can be expressed alternatively as 

N M Af„i„ 



R. 



approx 



^^g[n\(3g[n + k]\\l 



EE E 

n=l m=l A;=— Af„ii„ 

£ ||vec(/3,N/3j[n + A;])|h 

n=l k=-N„,i„ 
N 



J]||vec(/3,[n][/3j[n-iV^in], Pj[n-1], /3j[n + l], P'^[n + N, 



^min 1 



n=l 



|vec([/3,[l], (3g[2], Pg[N]]Bg) 



(33) 



(34) 



(35) 



(36) 



where vec(-) denotes the vectorization of a matrix (i.e., the column vector constructed by stacking all 



the elements of the matrix taken column by column), [/3q[l], /3g[2], 
and the N x IMN^i^ matrix Bg in the last equation is given by 



/3j[l-A^mm] 
/3j[iV-l-A^min] 



'M 



/3j[iV-l-iVmin] 



/3<[[3] 



PJN]] is an M X N matrix. 



Pj[N-2] 13'^ [N] 



/3 ' [2 + N^in] 



/3j[iV-l + iV„ 



/3j[2] 



/3j[2 + iV„ 



/3l[iV-iV^iJ 



(37) 



• P^[N-1] Ol ... 
where the second expression is obtained by noting that Pg[k] = Om for A; < 1 and k > N. Finally, 
we note that the vectorization of the product of a A; x £ matrix A and another i x m matrix B can be 
expressed as 



vec(AB) 



A)vec(B) = (B"^ (g)Ife)vec(A), 



(38) 



where Ip represents the px p identity matrix and (g) denotes the Kronecker product of two matrices [Van 



Loan 2000 1 . The Kronecker product of an m x n matrix A and apxq matrix B, results in the following 
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mp X nq matrix C: 



aiiB auB 
a2iB a22B 

flmiB am2B 



fflln.B 
a2nB 

^mnB 



(39) 



with Uij denoting the (z, j)-th element of matrix A. Applying ( [38] ) to the last expression of ([36]l, we 
finally get 



R. 



approx 



(l2Afiv„,„® [/3J1], /3J2], /3JiV]])vec(B 



q)\\l 



(B. 



(40) 



where we have used the fact that vec([/3g[l], /3g[2], . . . , (3q[N]]) = (3^ in the last expression, and the 
cost function that we want to minimize finally becomes 



J. 



1 2 T 

approx = ^ \\Zg - APg\\^ + XgWfBgWi + Pg\\{Bg <g)lAf)/3g||i. 



(41) 



Note that the structure of the new regularization term added, i?approx = Pgll(Bj(X'lM)/3g||i> is conditioned 



by the Kronecker product between B^ and the M x M identity matrix, Making use of p9\ , this 
product becomes 

' Bg{l,l)lM Bg{2,l)lM Bg{N,l)lM 

Bgil,2)lM 5,(2, 2)Ia/ ••• Bg{N,2)lM 



B' Im 



(42) 



Bg{l,L)lM Bg{2,L)lM ■■■ Bg{N,L)lM 
where Bq{i,j) denotes the (i,j)-th element of matrix B,, given by ( |37] ). Hence, BJ 1^/ provides 
us with a matrix composed of LN identity sub-matrices scaled by the corresponding coefficient of the 
sparse representation. When this matrix is multiplied by f3g we obtain all the products between coefficients 
associated to nearby activations, thus increasing the value of the cost function when two such nearby 
activations occur. 



B. Sparse Solution through Successive Convex Approximations 

Unfortunately, the new penalty term introduced in (32i leads to a non-convex optimization problem. 
However, in this subsection we present an algorithm, based on successive convex approximations (SCA) 
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I Chiang et al. 2007 Marks and Wright} 1978[ Avriel 19801, solving the constrained version of the 
Cross-Products LASSO. In particular, the problem to be solved can be formulated alternatively as 



minimize ||/3||i+c Tc 

/3,c 



(43) 



subject to \/3k\ = Ck, k = 1, . . . , MN 

l|y-*/3|| <^ 

where (3^ (resp. c^) is the k-th entry of (3 (resp c), ^ is some user-defined tolerable residual error, and 
the symmetric matrix F, with zeros along its main diagonal, penalizes the cross products of the absolute 
values of (3. That is, the entry 7^ £ > in the A;-th row and £-th column of F, induces a penalization 

Ik/akai = 7k,i\f3k\\l3i\- 



The optimization problem in ( [43] ) is difficult to solve, since the cost function is not convex whenever 
F 7^ 0. Moreover, the first set of constraints is not convex. However, taking into account that the cost 
function increases with Ck, we notice that this problem is equivalent to 



minimize 1 c + c Fc 

/3,c 



1, 



MN 



(44) 



subject to |/3yfc| < Ck, k 

l|y-*/3|| <C 

Let us now introduce the constraint 1 + 2Fc > 0, which ensures that, at the solution, the cost function 
increases with c^. Although this constraint is redundant at this point, it will become relevant soon. Thus, 
the optimization problem is 

minimize l^c + c^Fc 

/3,c 



(45) 



subject to <Ck, k = 1,..., MN 

l|y-*/3|| <C 

1 + 2Fc > 

where the main difficulty resides in the non-convex cost function. In order to deal with this difficulty and 
find a solution of the original Karush-Kuhn-Tucker (KKT) conditions |Boyd and Vandenberghe[ 2004 1, 
we apply the SCA methodology [Chiang et aL| |2007| [Marks and Wright| |1978| |Avriel[ |1980p . The main 
idea is replacing the non-convex functions by a sequence of local convex approximations, which must 
satisfy three conditions: 

1) The value of the original function, /(•), and its convex approximation, /(•), at the reference point 
xo should be the same, i.e., /(xq) = /(xq). 
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Algorithm 3 SCA for Cross-Products LASSO 
InputT, and y. 

Output: Recovered signal /3. 

Initialize cq = 0. 

Obtain the Matrices r+ and r_ from the EV of T 
repeat 



Solve the convex optimization problem in ( |46] l 
Update Co = c 
until Convergence 



2) The gradients at the reference point should coincide, i.e., V/(xo) = Vj(xo). 

3) The convex approximation must be an over-estimator of /(•), i.e., /(x) > /(x), Vx. 

In our particular case, given a reference value cq for the vector c, the cost function can be approximated by 
l^c + c^r+c + 2cQ r_(c — Co), where r+ and r_ are the positive semidefinite and negative semidefinite 
parts of r = r+ + r_. it is easy to check that this approximation satisfies the previous conditions, and 
therefore, the convex problem to be solved in each iteration of the proposed algorithm is finallyp^ 

minimize l^c + c^F+c + 2co r_ (c — co) 

/3,c 

subject to |/3fc| <Ck, k = I, . . . ,MN ^^^^ 

l|y-*/3|| <e 
1 + 2rc > 

The overall procedure is summarized in Algorithm [3j where the initial value for c (co = 0), reduces the 
cost function to the convex envelope of the original non-convex cost function. 

VII. Sparse Spectral Analysis 
A. Iterative Deflation Approach for Spectral Analysis 



Here we show the spectral analysis proposed on |Monz6n et al.[ 2012 1, which is based on applying 



an iterative deflation approach to the FFT of 7rq[n], extracting peaks with decreasing amplitudes up to 
a user defined threshold. Hence, since we apply the spectral analysis to the inferred sparse activation 

^"Note that the constraint 1 + 2rc > plays a crucial role in \46\ , ensuring that the first set of constraints is satisfied with 
equality |/3fe| = Ck. 
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sequence, we call our approach sparse spectral analysis (SSA). The number of peaks extracted (after 
the post-processing described in the following section) is an estimate of the number of existing foci and 
their locations provide us an estimate of their frequencies p] 

The first step in the SSA algorithm is segmenting 7rq[n] into J windows containing Ng = 4/^ samples 
(i.e. A = 4 s) without overlap, as done in DFA (see Section II-A| ). Then we apply Algorithm [4] to the FFT 



of each segment, n;^(/) = -F|7r^[n]| with 1 < j < J, after bandpass filtering. Algorithm |4j follows a 
deflation approach, searching iteratively for the highest peak of |ng(/)| within the frequency range that is 
physiologically interpretable (0.5 < /r < 2 Hz for sinus rythm and 2 < f,- < 10 Hz for AF) and adding it 
to the set of potential activation frequencies, tq . After each iteration we apply a second-order IIR digital 
notch filter to the signal centered around the detected frequency with bandwith i?3dB = 2/a = 0.5 Hz to 
eliminate the detected peak before searching for a new one. The algorithm stops when the highest peak 
detected is below a threshold, Tq = 7^ max |ng(/)|, being 7^ a user defined parameter. 

Algorithm 4 Iterative Spectral Analysis for the q-th signal, 
for j=l to J do 

Initialize = 7^ • max |nq(/)|. 

Initialize i = 1 and vr^ ^{t) = TTq{t) 

while max|nj .(/)| > F^ do 

1. Calculate the spectrum: .(/) = 

2. Obtain fg{i) = argmax|n;J 

3. Filter the signal: vr^_i+i(t) = vr^_i(i) * Kotch{t) 

4. i = i + 1 
end while 

end for 



Figure [T] shows an example of the spectrum obtained iteratively for a single segment. The activation 
sequence, 7rq[n], has been syntheticaly generated using R = 3 foci (with /i = 4 Hz, /2 = 6 Hz and 
/a = 7 Hz) and random phases. The highest peak for the amplitude spectrum in the first iteration (shown 
in black) is /i ^ 6.98 Hz, which corresponds to f^. Then we apply the notch filter centered around /i to 
the signal, obtaining the amplitude spectrum shown in blue, and detecting /2 ~ 5.96 Hz, which is close 
to /2. After a second notch filtering centered around /2, the third iteration (in green) detects /a ~ 4 Hz, 

^'The location of the highest peak (i.e. the first one extracted) provides us with the dominant frequency. 
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which corresponds to fi. After notch filtering again, iteration 4 (in red) detects w 7.99 Hz, which is 
the first harmonic of f^. Finally, after another notch filtering, all the peaks of the spectrum in the fifth 
iteration (in yellow) fall below the threshold Tg = 0.3 x max|ng(/)|. Hence, the algorithm concludes 
after obtaining 4 potential frequencies: tg = [6.98,5.96,4,7.99] Hz. 
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23456789 10 



f(Hz) 

Fig. 1. Example of the SSA for a single segment of 7rg[n]. 

B. Post-Processing: Discarding Harmonics 

The post-processing stage takes the set of potential activation frequencies detected inside each window, 
fq, and determines whether they belong to different activation foci or not applying the following steps: 

1) EUmination of repeated frequencies. Two frequencies, /i and /2, correspond to the same focus if 
l/i — /2I < /a- If this happens, the one associated to the smallest peak is deleted. 

2) Analysis of 2/3 frequency relationships. Due to the frequency range used in the analysis, given a 
single frequency, /o, in practice we can find at most two harmonics: fi = 2/o and /2 = 3/o. Thus, 
if we have detected the first and second harmonic of a given frequency, /o, their relationship will 
be /i = I/2. Here we check this relationship, keeping only the frequency associated to a higher 
amplitude in the spectrum when we find it. 
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3) Discovery of harmonics and subharmonics. Wlien two detected frequencies have a harmonic or 
subharmonic relationship, we only keep the one detected first in the spectral analysis and deleting 
the other. 

4) Discovery of cross-modulation frequencies. We analyze whether each new element in tg is a cross- 
modulation product of two previously detected frequencies, i.e. whether /s = ±m/i it n/2 for any 
two integers m and n. In this case /a will be deleted. 

With this analysis, we are able to estimate the number of activation foci present in our EGMs, Rg, as 
well as their frequencies, fg . Continuing with the example shown in Figure [ij the post-processing will 
find out that is the first harmonic of /s, deleting it and obtaining a correct final estimation of R = 3 
activation foci with frequencies iq = [6.98, 5.95, 4] Hz, which are quite close to the true ones. 

C. Alternative Spectral Analysis based on Eigen-Values 

We are currently considering alternative SSA approaches using methods based on eigen-values, such 
as ROOT MUSIC. 

VIII. Conclusions and Future Lines 
Contributions of the technical report: 

1) New more realistic mathematical model for EGM signals introduced based on latent signals (sparse 
activations or spike trains). 

2) Sparse reconstruction model based on an overcomplete dictionary proposed. 

3) Examples of dictionary construction based on Hermitian wavelets of order zero (energy-normaUzed 
Gaussians) and order one (Mexican hat wavelets). 

4) Indirect sparse solution of the problem using a LASSO regularization initially followed by a second 
stage to enforce the biological restrictions imposed by the refractory period of cardiac cells. 

5) Direct sparse solution of the problem using a new (non-convex) regularization term that we call 
cross-products LASSO (CP-LASSO). 

6) Successive convex approximations (SCA) approach introduced for solving the non-convex CP- 
LASSO optimization problem. 

Future lines: 

1) Perform many more simulations on real EGM data. 

2) Extend to the multi-channel case, probably using some type of Group LASSO formulation, although 
the way in which the groups are defined/learnt is unclear yet. 
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Appendix 

Derivation of the Discrete-Time Convolutional Model 

Since the sparse approximation is applied on the discrete-time difference sequence, Zq[n], the 
coefficients /Sm.gN will have the same support as this sequence, i.e., 1 < n < A^. Hence, /3m g[n] 
may be expressed as 

N 

PmM = -k\= /3™,,[n](u[n - 1] - u[n - {N + 1)]), (47) 

k=l 

where 5[n] denotes Kronecker's delta and u[n] Heaviside's unit step function. Regarding the elements of 
the dictionary, since the unknown input-output channels, hr^q[n], are assumed to be causal, here we will 
always consider causal discrete-time activations, Gm[n], typically obtained from a non-causal waveform, 
Gni{t) with support —Tm <t< Tm, through sampling and time-shifting. Thus, the support for Gm[n] 
will be < n < 2Nj^[ (with Nm = \Tm/Ts\, where [xj denotes the integer part of x, indicating the 
last non-zero element in the discrete-time waveform before time-shifting), and this sequence may be 
expressed as 

2Nm 

Gm[n] = Gm[mn -k] = G^[n]{u[n] - u[n - {2Nm + !)])• (48) 
Now we can formulate the convolution in (|7]) as 

oo 

PmM*Gm[n] = Prn,qmu[k - 1] - u[k - {N + l)])G^[n - k]{u[n -k]-u[n-k- {2Nm + 1)]). 

fc=— oo 

(49) 

In order to establish the limits for this convolution, we notice that 

(k — \ >{)^k>l 
(50) 
A;-(iV + l)<O^A;<iV + l^A;<A^, 



u[n-k]-u[n-k-{2NM+l)] 7^ 4^ < 



f 

n — k>0^k<n, 



n-k- {2Nm + l)<^^k>n- {2Nm + l)^k>n- 2Nm- 

(51) 



Therefore, the lower limit for the convolution will be 

kinf = sup{l, n - 2Nm] = 1 V (n - 2Nm), (52) 
whereas the upper limit will be 

ksup = inf{n, N} = n ^N = n. (53) 
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Inserting these limits in ( [491 ), we obtain 

n 

lim,q[n]*Gm[n\= ^ l3m,q[k\Gm[n - k]. (54) 

A:=lV(n-2AfM) 

Finally, by stating explicitly that Gm[?^] = for n > 2Nm and n < 0, we may remove the supremum 
from the lower limit and perform the sum from 1 up to A^, as is done in Q, although there will only be 
at most 2Nm + 1 non-zero terms in the sum. 

Alternatively, we can formulate the convolution in (|7]) as 

oo 

PmM*Gm[n\ = GnMHk] " u[k - {2Nm + l)\)Pm,q[n " k]{u[n -k-l]-u[n-k-{N + 1)]). 

(55) 

In order to establish the limits for this convolution, we notice again that 

> 0, 

u[k] - u[k - {2Nm + 1)] / 4^ <( (56) 

k - {2Nm + 1) < ^ /c < 2Nm + l^k< 2Nm, 



u[n-k-l]-u[n-k-{N + l)]^0^ 



n-k-l>0^k<n-l, 

n-k-{N + l)<0^k>n-{N + l)^k>n-N. 

(57) 



Therefore, now the lower limit for the convolution will be 

kinf = sup{0, n- N} = 0V {n- N) = 0, (58) 
whereas the upper limit will be 

ksup = mi{2NM, n-l} = {2Nm) A (n - 1). (59) 



Inserting these limits in ( |49| ), we obtain 

(2iVM)A(n-l) 



/3m,gN * = ^ Gm[k]l3m,q[n - k]. (60) 

fc=0 

Finally, by stating explicitly that = for n > 2Nm and I3m,q[n\ = for n < 1, we may remove 

the infimum from the upper limit and perform the sum from up to — 1, although there will only be 
at most 2Nm + 1 non-zero terms in the sum again. 

Derivation of the Hermitian Dictionaries 

In this section we derive the hermitian wavelet dictionaries used for the sparse reconstruction. First 
we obtain the normalization factor for the Gaussian function, which corresponds to the zero-th order 
Hermitian wavelet. Then we develop the normalized expressions for the first and second order Hermitian 
wavelets. Finally, we briefly discuss the general shape of the ^-th order Hermitian wavelet. 
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Zero-th Order Hermitian Wavelet (Energy-Normalized Gaussian) 

Let us denote the m-th standaid Gaussian function as 

1 / 



</)W(t) = ^_exp(-^). (61) 



This function corresponds to a proper and normaUzed probability density function (PDF), i.e. (p^m (t) > 

for — oo < t < oo, and 

cj>^^\t)dt = 1. (62) 



However, it is not normalized in energy, since 



oo 



oo 



— / exp (-\- ] dt 

^^"^m J-oo \ '^m 



E{tO} = ^ (63) 



is not equal to one unless we have am = l/[2\/7r]- I^i the sequel we will use E{/(t)} to denote the 
expectation of f{t) w.rt. a Gaussian centered around the origin with standard deviation a„il\Fi, i.e. 

E{/(t)} = r , ^^^^ exp f ^^^^ 



/(t) exp ^ dt. (64) 



Finally, making use of (61 1 and (63 1, the energy-normalized version of the Gaussian function will be 



^t\t) = = exp (--^ ) , (65) 

1(0) ^^Mi2 vrV4 \^ 20-4 
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which is precisely the expression given by (|9]l. 

First-Order Hermitian Wavelet 

The first-order hermitian wavelet is the negative normalized first derivative of the Gaussian function. 
Taking the first derivative of ( |6T] ) we obtain the following unnormalized function 

rJi^) = — = 77^^ exp -— ^ , (66) 



dt V2^al V 
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which has an energy 



\^g\t)\'dt 



oo 

1 



exp 



dt 



27r(T,6 



1 



(67) 



where we have used the fact that E{t^} = following the definition of the expectation operator 

provided by (|64]). Finally, making use of ( |66l ) and (67 1, the energy-normalized first-order hermitian wavelet 
will be 



-'m 



2t 



2 vrV4y2^ 



exp 



2al 



(68) 



Second-Order Hermitian Wavelet (Mexican Hat Wavelet) 

The second-order hermitian wavelet is the negative normalized second derivative of the Gaussian 
function. Taking the first derivative of (|66]l we obtain the following unnormalized function 



6(2) 



d^^ 
dt 



1 



which has an energy 

£;(2) 



6(2) 



exp 



2ct2 



(69) 



oo 

1 



a, 



exp 



at. 



dt 



E{t°} 



1 



1 



2 a. 



m 

^E{t'} + ^E{t'} 

2 



m 



+ 



1 3a: 



m 



-2 2 di 4 



8^/^^af^ 



(70) 



where we have used the fact that E{t^} = 3cr^/4, following the definition of the expectation operator 
provided by ( |64| . Finally, making use of ([69]l and ( [70] ), the energy-normalized second-order hermitian 
wavelet, also known as Mexican hat or Ricker wavelet, will be 



Cl>'n.\t) 



7ri/4^/3^ 



1 



t' 



at. 



exp 



2^2 



(71) 
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Higher-Order Hermitian Wavelets 

In general, the l-t\\ order Hermitian wavelet, 4>m{t) for ^ > 1, is obtained as the ^-th negative 
normalized derivative of the Gaussian function. In order to derive (pm (t) we follow the three-step 
procedure used for the first and second order wavelets: we calculate first the unnormalized derivative. 



■^^^^^ dt^ dt^-^ dt 

which can be obtained easily by applying the following recursion]^ 



then we obtain its energy. 



t)\^dt, 



and the £-th order normalized Hermitian wavelet is finally given by 



it) 



(72) 



(73) 



(74) 



(75) 



Following this procedure, we notice that the £-th order Hermitian wavelet can be expressed as 



exp 



(76) 



where Hi{x) denotes the ^-th order Hermite polynomial |Abramowitz and Stegun 1965 1 and Km is a 
normalization constant, obtained as 



K, 



exp 



2^2^ 



dt 



-1/2 



where the last expectation is as defined in ([64]). 




-1/2 



(77) 



^^The recursion in l |73[ l is valid for £ > 2 and can be easily proved by induction. Indeed, by defining ~ 0, it is valid 

even for 1=1. 
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